Data Visualization

Carlina Feldmann
Lennart Oelschläger

Version of 19.03.2023

Why and what

Welcome to this tiny course on data visualization in R with {ggplot2}! 👋

Why do we care?

Plots can beautifully inform or horribly mislead. Colors and shapes matter! ⚖️

Why {ggplot2}?

The {ggplot2} package implements a grammar of graphics, which breaks the making of a graphic down into a series of distinct tasks.

What is this course about?

Being in decent control of {ggplot2} to produce meaningful plots.

What do you need?

Basic R skills + a not-too-old version of R (>= 4.0.0) + RStudio

At the end of the day…

Sources

Found mistakes? Have suggestions?

I’m sure you have! Please leave a note here. 🙏

Our first plot

First we get {ggplot2}.

# install.packages("ggplot2")
library(ggplot2)

Next we need data. Let’s go with an excerpt from the famous Gapminder dataset:

# install.packages("gapminder")
library(gapminder)
head(gapminder)
## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.
str(gapminder)
## tibble [1,704 × 6] (S3: tbl_df/tbl/data.frame)
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num [1:1704] 28.8 30.3 32 34 36.1 ...
##  $ pop      : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num [1:1704] 779 821 853 836 740 ...

Now we tell the ggplot() function which data we use and which variables we wish to see on each axis:

ggplot(
  data = gapminder, 
  mapping = aes(x = gdpPercap, y = lifeExp)
) 

Something is missing … 🤔 We need an additional layer, a geom_* function!

ggplot(
  data = gapminder, 
  mapping = aes(x = gdpPercap, y = lifeExp)
) +
  geom_point()

There are more geoms, and we can simply add them (literally add!):

p <- ggplot(
  data = gapminder, 
  mapping = aes(x = gdpPercap, y = lifeExp)
)
p <- p + geom_point() + geom_smooth()
p

As a last polishing step for now, we improve the x-axis scale and the plot labels.

p + scale_x_log10(labels = scales::dollar) +
  labs(x = "GDP per capita",
       y = "Life expectancy in years",
       title = "Economic growth as an indicator for life expectancy",
       subtitle = "Data points are country-years",
       caption = "Source: Gapminder")
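
To see what the labels argument does, it can help to call the labelling function directly. A minimal sketch, assuming the {scales} package is available (it is installed as a dependency of {ggplot2}); the input values are made up for illustration:

```r
# scales::dollar() turns numbers into currency strings;
# scale_x_log10(labels = scales::dollar) applies it to the axis breaks.
example_breaks <- c(1000, 10000)  # hypothetical axis breaks
dollar_labels <- scales::dollar(example_breaks)
print(dollar_labels)
```

Any function that maps a numeric vector to a character vector of the same length can be passed to labels in the same way.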

Finally, we can use the ggsave() function to save our plot:

ggsave("some_descriptive_name.pdf")
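
By default, ggsave() saves the most recently displayed plot. A minimal sketch of a more explicit call, assuming the plot has been stored in an object first; the object name p_final, the file name, and the dimensions are illustrative choices, not part of the course material:

```r
library(ggplot2)
library(gapminder)

# Store the plot in an object (p_final is a hypothetical name)
p_final <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10(labels = scales::dollar)

# Name the plot explicitly and fix the output size,
# so the result does not depend on what was displayed last
out_file <- file.path(tempdir(), "gdp_vs_life_expectancy.pdf")
ggsave(out_file, plot = p_final, width = 16, height = 10, units = "cm")
```

Passing plot = ... makes a script reproducible even when several plots are drawn before saving.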

Summary of the {ggplot2} workflow

  1. Call ggplot()
  2. Set data = ...
  3. Set mapping = aes(...)
  4. Add one (or more) geom_*() functions
  5. Adjust the scale and labels
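
The five steps above can be sketched in a single annotated call. This only rearranges the code from the previous slides; the object name p_template is a hypothetical choice:

```r
library(ggplot2)
library(gapminder)

p_template <- ggplot(                        # 1. call ggplot()
  data = gapminder,                          # 2. set the data
  mapping = aes(x = gdpPercap, y = lifeExp)  # 3. map variables to aesthetics
) +
  geom_point() +                             # 4. add one (or more) geoms
  scale_x_log10() +                          # 5. adjust the scale ...
  labs(x = "GDP per capita",                 #    ... and the labels
       y = "Life expectancy in years")
p_template
```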

Now it’s your turn

This course includes practicals! 💪

Executing the following lines gives you access to the course material:

# install.packages("devtools")
devtools::install_github("loelschlaeger/rcourse")
library(rcourse)

To start the practicals, type:

practicals()

To open a copy of these slides, type:

slides()

Facets and more geoms

Our goal is to plot the trajectory of life expectancy over time for each country in the gapminder data.

ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) +
  geom_line()

This looks odd, we forgot to group by country! 💡

ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) +
  geom_line(aes(group = country))

But can you make sense of this mess? Luckily, we can additionally split the plot into one panel per continent:

ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) +
  geom_line(aes(group = country)) +
  facet_wrap(~continent)

Better not to facet_wrap(~country)… 🛑 Let’s polish our plot with the things we already learned:

ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) +
  geom_line(color = "grey", aes(group = country)) +
  geom_smooth() +
  facet_wrap(~continent) +
  labs(x = "Year",
       y = "Life expectancy",
       title = "Life expectancy over time on five continents")
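
The geom_line() call above mixes two ideas: color is set to a fixed value outside aes(), while group is mapped to a variable inside aes(). A small sketch of the difference, not taken from the course material (the object names are hypothetical):

```r
library(ggplot2)
library(gapminder)

base <- ggplot(gapminder, aes(x = year, y = lifeExp, group = country))

# Setting: every line gets the same fixed colour, no legend is drawn
p_set <- base + geom_line(color = "grey")

# Mapping: the colour depends on a variable, so a legend is drawn
p_map <- base + geom_line(aes(color = continent))
```

Printing p_set and p_map side by side shows the effect: only the mapped version carries information about the continent.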

Notice that we supplied a formula to facet_wrap. This can be more advanced, for example (with facet_grid):

ggplot(data = socviz::gss_sm, mapping = aes(x = age, y = childs)) +
  geom_point(alpha = 0.2) +
  geom_smooth() +
  facet_grid(sex ~ race) +
  labs(x = "Age",
       y = "No. of children",
       title = "Relationship between age and number of children",
       subtitle = "Separated by sex (in rows) and race (in columns)")

As a last input for this part, we learn four new geoms:

Bar plots

ggplot(data = socviz::gss_sm, mapping = aes(x = religion)) +
  geom_bar()

Using relative instead of absolute counts on the y-axis is covered in the tutorials.
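
As a preview, one common idiom for relative counts is sketched below; the tutorials may take a different route. The example uses the gapminder data instead of gss_sm, and the object names are hypothetical:

```r
library(ggplot2)
library(gapminder)

gap_2007 <- subset(gapminder, year == 2007)

# after_stat(prop) is the share computed by geom_bar()'s counting stat;
# group = 1 makes the shares relative to all rows instead of within each bar
p_rel <- ggplot(gap_2007, aes(x = continent)) +
  geom_bar(aes(y = after_stat(prop), group = 1)) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "Share of countries")
p_rel
```

Without group = 1, each bar forms its own group and every share equals 100%.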

Histograms

ggplot(data = socviz::gss_sm, mapping = aes(x = age)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 10 rows containing non-finite values (`stat_bin()`).

There is a message and a warning. We will address both in the practicals.
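
Until then, one possible way to silence both is sketched here. The bin width of 5 years is an arbitrary choice for illustration, and the practicals may resolve these differently:

```r
library(ggplot2)

# binwidth = 5 groups ages into 5-year bins, which removes the message;
# na.rm = TRUE drops the missing ages silently, which removes the warning
p_hist <- ggplot(data = socviz::gss_sm, mapping = aes(x = age)) +
  geom_histogram(binwidth = 5, na.rm = TRUE)
p_hist
```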

Density plots

library(dplyr)
ggplot(data = filter(gapminder, year == 2007), 
       mapping = aes(x = lifeExp)) +
  geom_density()

Boxplots

ggplot(data = filter(gapminder, year == 2007), 
       mapping = aes(x = pop,
                     y = reorder(continent, pop))) +
  geom_boxplot() +
  scale_x_log10() + 
  labs(y = NULL,
       x = "Populations in 2007")

We look at a variant on the basic boxplot that {ggplot2} offers in the tutorials.

Draw Maps

R can work with geographical data, and {ggplot2} can produce choropleth maps.

world <- map_data("world")
p <- ggplot(data = world, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black")
plot(p)

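The plot above draws longitude and latitude on plain Cartesian axes. With coord_map(), which requires the {mapproj} package and uses the Mercator projection by default, we can apply a map projection, for example the Albers projection: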

p + coord_map(projection = "albers", lat0 = 15, lat1 = 45)

In the tutorials, we will visualize the results of the 2016 Trump vs. Clinton election on a map of the US states.

Challenge

Reproduce this plot! 😎

Don’t forget to install and load the packages {ggplot2} and {dplyr} and load the gapminder dataset. If you want to see some hints, scroll down this page.















Hint 1: Use your {dplyr} knowledge to create an extract of the gapminder dataset that only contains values from 2007.




Hint 2: Have a look at the 3rd slide of this presentation to copy the basic syntax and remember how to modify the labels.




Hint 3: You can set the size and colour of the points to depend on certain variables in the aesthetics aes().




Hint 4: Have a look at ?guides to modify the legends.

Animations

{ggplot2} itself does not allow for interactive or animated visualizations. However, there are (of course) packages to achieve this, e.g. {plotly}, {gganimate}, {shiny}.

# Static plot that serves as the basis for both variants
plot <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, size = pop, colour = continent)) +
  geom_point(alpha = 0.7) +
  scale_x_log10(labels = scales::dollar) +
  guides(size = "none", colour = guide_legend(title = "")) +
  labs(
    x = "GDP per capita",
    y = "Life expectancy in years",
    title = "Economic growth as an indicator for life expectancy",
    caption = "Source: Gapminder"
  )

# Interactive version with {plotly}
library(plotly)
ggplotly(plot)

# Animated version with {gganimate}, stepping through the years
library(gganimate)
library(gifski)
plot + transition_time(year) +
  labs(title = "Year: {frame_time}")
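
To keep an animation as a file, it can be rendered and saved explicitly. A minimal sketch, assuming {gganimate} and {gifski} are installed; the frame count, frame rate, object names, and file name are illustrative choices, not part of the course material:

```r
library(ggplot2)
library(gapminder)
library(gganimate)

# A reduced version of the animated plot (anim is a hypothetical name)
anim <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.7) +
  scale_x_log10() +
  transition_time(year) +
  labs(title = "Year: {frame_time}")

# Render with explicit settings ...
rendered <- animate(anim, nframes = 50, fps = 10,
                    renderer = gifski_renderer())

# ... and write the result to a GIF file
out_gif <- file.path(tempdir(), "gapminder.gif")
anim_save(out_gif, animation = rendered)
```

Fewer frames render faster, which is handy while still tuning the plot.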